Machine Learning II Exam

Team Members:

INSURANCE FRAUD DETECTION USING MACHINE LEARNING

WHAT IS INSURANCE FRAUD?

WHY INSURANCE FRAUD?

PROBLEM STATEMENT:

DATASET:

Insurance Fraud Detection

1. Import necessary packages
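
The packages used across the steps below can be imported up front. This is a sketch; the exact package list in the original notebook is assumed (seaborn is cited in the references for visualization):

```python
# Core packages for data handling, visualization, and modelling.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt            # plotting backend
import seaborn as sns                      # statistical visualization (ref. 1)
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
```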

2. Data Preprocessing

1. Finding Missing Values

2. Null Imputation

3. Outlier Analysis
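
The three preprocessing steps above can be sketched as follows. The column names and the use of `"?"` as a missing-value marker are assumptions about the claims dataset:

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the claims data; column names are assumptions.
df = pd.DataFrame({
    "collision_type": ["Rear Collision", "?", "Side Collision", "?"],
    "total_claim_amount": [63400, 5070, 34650, 1_000_000],  # last value is extreme
})

# 1. Finding missing values: normalise the "?" marker to NaN, then count per column.
df = df.replace("?", np.nan)
missing_counts = df.isnull().sum()

# 2. Null imputation: fill categorical gaps with the column mode.
df["collision_type"] = df["collision_type"].fillna(df["collision_type"].mode()[0])

# 3. Outlier analysis: flag values outside 1.5 * IQR of the numeric column.
q1, q3 = df["total_claim_amount"].quantile([0.25, 0.75])
iqr = q3 - q1
outliers = df[(df["total_claim_amount"] < q1 - 1.5 * iqr) |
              (df["total_claim_amount"] > q3 + 1.5 * iqr)]
```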

3. Exploratory Data Analysis

1. Univariate Analysis

2. Bivariate Analysis

3. Multivariate Analysis

1. Univariate Analysis

2. Bivariate Analysis: Correlation

3. Multivariate Analysis
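
A correlation matrix visualised as a heatmap is a common way to carry out the bivariate and multivariate checks above. A minimal sketch on toy numeric columns (the real column names are assumptions; the heatmap call follows the seaborn documentation cited in the references):

```python
import numpy as np
import pandas as pd
import matplotlib
matplotlib.use("Agg")              # headless backend for scripted runs
import matplotlib.pyplot as plt
import seaborn as sns

rng = np.random.default_rng(0)
# Toy numeric frame standing in for the claims data.
df = pd.DataFrame({
    "months_as_customer": rng.integers(1, 300, 100),
    "policy_annual_premium": rng.normal(1200, 200, 100),
    "total_claim_amount": rng.normal(50000, 15000, 100),
})

corr = df.corr()                                  # pairwise Pearson correlations
sns.heatmap(corr, annot=True, cmap="coolwarm")    # multivariate view at a glance
plt.tight_layout()
```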

Checking for Outliers

4. Hypothesis Testing

Null Hypothesis (H0): There is no relationship between the outcome variable and the auto_model categorical variable.

Alternative Hypothesis (H1): There is a relationship between the outcome variable and the auto_model categorical variable.
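
This hypothesis can be tested with a chi-square test of independence on the contingency table of the two variables, using scipy's `chi2_contingency`. The column names and toy values below are assumptions:

```python
import pandas as pd
from scipy.stats import chi2_contingency

# Toy data: auto_model vs. fraud outcome; values are illustrative assumptions.
df = pd.DataFrame({
    "auto_model": ["A3", "A3", "Camry", "Camry", "A3", "Camry", "A3", "Camry"],
    "fraud_reported": ["Y", "N", "N", "N", "Y", "N", "Y", "N"],
})

# Contingency table of observed counts.
table = pd.crosstab(df["auto_model"], df["fraud_reported"])

# Chi-square test of independence: a small p-value means we reject H0,
# i.e. the outcome and auto_model are related.
chi2, p_value, dof, expected = chi2_contingency(table)
alpha = 0.05
reject_h0 = p_value < alpha
```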

5. Feature Engineering

1. Data Encoding

2. Data Scaling

3. SMOTE Analysis

1. Data Encoding

2. Data Scaling

3. SMOTE Analysis
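
The three feature-engineering steps can be sketched as below, assuming the target column is `fraud_reported`. Because imblearn's `SMOTE` may not be installed here, simple random oversampling with sklearn's `resample` stands in for the SMOTE step; SMOTE instead synthesizes new minority samples by interpolating between nearest neighbours:

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.utils import resample

# Toy frame; real column names are assumptions about the claims dataset.
df = pd.DataFrame({
    "incident_severity": ["Major", "Minor", "Minor", "Minor", "Major", "Minor"],
    "total_claim_amount": [63400.0, 5070.0, 34650.0, 6500.0, 70000.0, 4800.0],
    "fraud_reported": [1, 0, 0, 0, 1, 0],
})

# 1. Data encoding: one-hot encode the categorical column.
df = pd.get_dummies(df, columns=["incident_severity"], drop_first=True)

# 2. Data scaling: standardize the numeric feature to zero mean / unit variance.
scaler = StandardScaler()
df[["total_claim_amount"]] = scaler.fit_transform(df[["total_claim_amount"]])

# 3. Class rebalancing: random oversampling of the minority (fraud) class as a
#    stand-in for SMOTE, which would create synthetic samples instead.
minority = df[df["fraud_reported"] == 1]
majority = df[df["fraud_reported"] == 0]
minority_up = resample(minority, replace=True,
                       n_samples=len(majority), random_state=42)
balanced = pd.concat([majority, minority_up])
```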

6. Modelling

1. Logistic Regression

2. RandomForestClassifier

3. XGBoostClassifier

1. Logistic Regression

2. RandomForestClassifier

Classification report for the RandomForestClassifier model

RandomForestClassifier After Parameter Tuning

3. XGBoostClassifier Before Parameter Tuning

Classification report before parameter tuning

XGBoostClassifier After Parameter Tuning

Classification report after parameter tuning
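
The modelling stage can be sketched end to end with scikit-learn. Since the xgboost package may not be available here, sklearn's `GradientBoostingClassifier` stands in for XGBoostClassifier, and the tuning grid values are illustrative assumptions:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.metrics import classification_report

# Synthetic imbalanced data standing in for the engineered claims features.
X, y = make_classification(n_samples=400, n_features=10,
                           weights=[0.75, 0.25], random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42)

models = {
    "LogisticRegression": LogisticRegression(max_iter=1000),
    "RandomForestClassifier": RandomForestClassifier(random_state=42),
    # xgboost's XGBClassifier would slot in here; GradientBoostingClassifier
    # is a stand-in so the sketch runs without the xgboost package.
    "GradientBoosting (XGBoost stand-in)": GradientBoostingClassifier(random_state=42),
}

reports = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    reports[name] = classification_report(y_test, model.predict(X_test))

# Parameter tuning for the random forest over a small, illustrative grid.
grid = GridSearchCV(RandomForestClassifier(random_state=42),
                    {"n_estimators": [100, 200], "max_depth": [None, 5]},
                    cv=3, scoring="f1")
grid.fit(X_train, y_train)
tuned_report = classification_report(y_test, grid.best_estimator_.predict(X_test))
```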

References

  1. https://seaborn.pydata.org/tutorial/categorical.html [visualization]
  2. https://plotly.com/python/v4-migration/ [interactive visualization]
     2.1 Chi-square test for feature selection [ALEngineering channel]
  3. https://towardsdatascience.com/handling-imbalanced-datasets-in-machine-learning-7a0e84220f28 [handling imbalanced datasets]
     3.1 https://www.analyticsvidhya.com/blog/2017/03/imbalanced-data-classification/ [handling imbalanced data]
     3.2 https://machinelearningmastery.com/smote-oversampling-for-imbalanced-classification/ [handling imbalanced data]
  4. https://www.youtube.com/watch?v=fxw_Ak4t-LY [types of encoding]
     4.1 https://www.coursera.org/lecture/competitive-data-science/concept-of-mean-encoding-b5Gxv [mean encoding concept]
  5. https://machinelearningmastery.com/threshold-moving-for-imbalanced-classification/ [threshold value]
     5.1 https://www.youtube.com/watch?v=_AjhdXuXEDE [finding the optimal threshold for classification]
     5.2 https://numpy.org/doc/stable/reference/generated/numpy.linspace.html [numpy package for computation]
     5.3 https://www.kaggle.com/nirajvermafcb/comparing-various-ml-models-roc-curve-comparison [ROC curve comparison]
  6. https://towardsdatascience.com/various-ways-to-evaluate-a-machine-learning-models-performance-230449055f15 [model evaluation metrics]
  7. Robust logistic regression for insurance risk classification (repec.org)
  8. Vehicle insurance — Random forest classifier | Aviral Bhardwaj | Medium
  9. Accuracy vs. F1-Score: A comparison between Accuracy and F1-Score | Purva Huilgol | Analytics Vidhya | Medium